breast cancer
- South America > Venezuela (0.05)
- North America > United States > South Carolina (0.05)
- North America > United States > New Jersey (0.05)
- (2 more...)
- Research Report > New Finding (0.96)
- Research Report > Experimental Study (0.70)
- Media > News (1.00)
- Government > Regional Government > North America Government > United States Government (0.48)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.42)
How Ensemble Learning Balances Accuracy and Overfitting: A Bias-Variance Perspective on Tabular Data
Abstract--Tree-based ensemble methods consistently outperform single models on tabular classification tasks, yet the conditions under which ensembles provide clear advantages--and prevent overfitting despite using high-variance base learners--are not always well understood by practitioners. We study four real-world classification problems (Breast Cancer diagnosis, Heart Disease prediction, Pima Indians Diabetes, and Credit Card Fraud detection) comparing classical single models against nine ensemble methods using five-seed repeated stratified cross-validation with statistical significance testing. Our results reveal three distinct regimes: (i) On nearly linearly separable data (Breast Cancer), well-regularized linear models achieve 97% accuracy with <2% generalization gaps; ensembles match but do not substantially exceed this performance. We systematically quantify dataset complexity through linearity scores, feature correlation, class separability, and noise estimates, explaining why different data regimes favor different model families. Cross-validated train/test accuracy and generalization-gap plots provide simple visual diagnostics for practitioners to assess when ensemble complexity is warranted. Statistical testing confirms that ensemble gains are significant on nonlinear tasks (p < 0.01) but not on near-linear data (p > 0.15). The study provides actionable guidelines for ensemble model selection in high-stakes tabular applications, with full code and reproducible experiments publicly available. A model that almost perfectly fits its training data can still fail badly on new cases. This gap between training performance and real-world behaviour is the essence of overfitting, and it is particularly problematic in domains such as medical diagnosis and financial fraud detection, where mistakes are costly: missed tumours delay treatment, and undetected fraud translates directly into monetary loss.
- North America > United States > Wisconsin (0.04)
- Asia > India (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.90)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.60)
Clinical-R1: Empowering Large Language Models for Faithful and Comprehensive Reasoning with Clinical Objective Relative Policy Optimization
Gu, Boyang, Zhou, Hongjian, Segal, Bradley Max, Wu, Jinge, Cao, Zeyu, Zhong, Hantao, Clifton, Lei, Liu, Fenglin, Clifton, David A.
Recent advances in large language models (LLMs) have shown strong reasoning capabilities through large-scale pretraining and post-training reinforcement learning, demonstrated by DeepSeek-R1. However, current post-training methods, such as Grouped Relative Policy Optimization (GRPO), mainly reward correctness, which is not aligned with the multi-dimensional objectives required in high-stakes fields such as medicine, where reasoning must also be faithful and comprehensive. We introduce Clinical-Objective Relative Policy Optimization (CRPO), a scalable, multi-objective, verifiable reinforcement learning method designed to align LLM post-training with clinical reasoning principles. CRPO integrates rule-based and verifiable reward signals that jointly optimize accuracy, faithfulness, and comprehensiveness without relying on human annotation. To demonstrate its effectiveness, we train Clinical-R1-3B, a 3B-parameter model for clinical reasoning. The experiments on three benchmarks demonstrate that our CRPO substantially improves reasoning on truthfulness and completeness over standard GRPO while maintaining comfortable accuracy enhancements. This framework provides a scalable pathway to align LLM reasoning with clinical objectives, enabling safer and more collaborative AI systems for healthcare while also highlighting the potential of multi-objective, verifiable RL methods in post-training scaling of LLMs for medical domains.
- Africa (0.05)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > India (0.04)
- (4 more...)
- Health & Medicine > Diagnostic Medicine (1.00)
- Health & Medicine > Therapeutic Area > Oncology (0.96)
Copula Based Fusion of Clinical and Genomic Machine Learning Risk Scores for Breast Cancer Risk Stratification
Aich, Agnideep, Hewage, Sameera, Murshed, Md Monzur
Clinical and genomic models are both used to predict breast cancer outcomes, but they are often combined using simple linear rules that do not account for how their risk scores relate, especially at the extremes. Using the METABRIC breast cancer cohort, we studied whether directly modeling the joint relationship between clinical and genomic machine learning risk scores could improve risk stratification for 5-year cancer-specific mortality. We created a binary 5-year cancer-death outcome and defined two sets of predictors: a clinical set (demographic, tumor, and treatment variables) and a genomic set (gene-expression $z$-scores). We trained several supervised classifiers, such as Random Forest and XGBoost, and used 5-fold cross-validated predicted probabilities as unbiased risk scores. These scores were converted to pseudo-observations on $(0,1)^2$ to fit Gaussian, Clayton, and Gumbel copulas. Clinical models showed good discrimination (AUC 0.783), while genomic models had moderate performance (AUC 0.681). The joint distribution was best captured by a Gaussian copula (bootstrap $p=0.997$), which suggests a symmetric, moderately strong positive relationship. When we grouped patients based on this relationship, Kaplan-Meier curves showed clear differences: patients who were high-risk in both clinical and genomic scores had much poorer survival than those high-risk in only one set. These results show that copula-based fusion works in real-world cohorts and that considering dependencies between scores can better identify patient subgroups with the worst prognosis.
- North America > United States > New York (0.04)
- North America > United States > Minnesota > Blue Earth County > Mankato (0.04)
- North America > United States > Louisiana > Lafayette Parish > Lafayette (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.83)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Massachusetts (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (4 more...)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Diagnostic Medicine (1.00)
- Health & Medicine > Therapeutic Area > Gastroenterology (0.94)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.93)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.28)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
Breast density in MRI: an AI-based quantification and relationship to assessment in mammography
Chen, Yaqian, Li, Lin, Gu, Hanxue, Dong, Haoyu, Nguyen, Derek L., Kirk, Allan D., Mazurowski, Maciej A., Hwang, E. Shelley
Mammographic breast density is a well-established risk factor for breast cancer. Recently there has been interest in breast MRI as an adjunct to mammography, as this modality provides an orthogonal and highly quantitative assessment of breast tissue. However, its 3D nature poses analytic challenges related to delineating and aggregating complex structures across slices. Here, we applied an in-house machine-learning algorithm to assess breast density on normal breasts in three MRI datasets. Breast density was consistent across different datasets (0.104 - 0.114). Analysis across different age groups also demonstrated strong consistency across datasets and confirmed a trend of decreasing density with age as reported in previous studies. MR breast density was correlated with mammographic breast density, although some notable differences suggest that certain breast density components are captured only on MRI. Future work will determine how to integrate MR breast density with current tools to improve future breast cancer risk prediction.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)